Improved Generative Semisupervised Learning Based on Finely Grained Component-Conditional Class Labeling
Authors
Abstract
We introduce new generative semisupervised mixtures with more fine-grained class label generation mechanisms than in previous works. Our models combine the advantages of semisupervised mixtures, which achieve label extrapolation over a component, and nearest-neighbor (NN)/nearest-prototype (NP) classification, which achieves accurate classification in the vicinity of labeled samples. We propose several two-stage stochastic data generation mechanisms that first generate unlabeled data and then generate each labeled sample, not directly according to a component density, but rather based on unlabeled samples that were generated according to the component density. Variants of our method differ by whether both the samples and labels, or just the labels, are generated based on unlabeled samples. Our generation mechanisms entail more complicated (albeit still exact) E-step evaluations than for standard mixtures. These form the basis for generalized EM algorithms for learning. Our models are advantageous when within-component class proportions are not constant over the feature space region "owned by" a component. Experiments on UC Irvine data sets demonstrate consistent gains in classification accuracy, compared with previous semisupervised mixtures. For clustering and density estimation, our methods outperform previous semisupervised approaches, but only outperform unsupervised mixture learning when there is both significant true component overlap and sufficient labeled examples to disambiguate components.
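For orientation, the following is a minimal sketch of the kind of standard semisupervised mixture that the abstract takes as its baseline, not the two-stage models proposed in the paper. It assumes one-dimensional Gaussian components and a single class-probability vector per component; the names `fit_semisup_mixture`, `predict_proba`, and `beta` are hypothetical and chosen only for illustration. Labeled samples weight each component's responsibility by the probability that the component emits the observed label, and unlabeled samples use the density alone.

```python
import numpy as np


def gauss_pdf(x, mean, var):
    """Univariate Gaussian density; broadcasts samples against components."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def fit_semisup_mixture(x_lab, y_lab, x_unl, n_comp, n_class, n_iter=50):
    """EM for a baseline semisupervised Gaussian mixture (illustrative only).

    beta[k, c] is the assumed probability that component k emits class c.
    """
    x_lab = np.asarray(x_lab, dtype=float)
    x_unl = np.asarray(x_unl, dtype=float)
    y_lab = np.asarray(y_lab, dtype=int)
    x_all = np.concatenate([x_lab, x_unl])  # labeled samples come first
    n_lab = len(x_lab)

    # Deterministic initialization: means at evenly spaced quantiles.
    means = np.quantile(x_all, (np.arange(n_comp) + 0.5) / n_comp)
    varis = np.full(n_comp, x_all.var())
    pis = np.full(n_comp, 1.0 / n_comp)
    beta = np.full((n_comp, n_class), 1.0 / n_class)
    onehot = np.eye(n_class)[y_lab]  # shape (n_lab, n_class)

    for _ in range(n_iter):
        # E-step: component responsibilities for every sample.
        joint = pis * gauss_pdf(x_all[:, None], means, varis)  # (n, K)
        joint[:n_lab] *= beta[:, y_lab].T  # labeled rows also weigh the label
        resp = joint / joint.sum(axis=1, keepdims=True)

        # M-step: closed-form updates from the responsibilities.
        nk = resp.sum(axis=0)
        pis = nk / nk.sum()
        means = (resp * x_all[:, None]).sum(axis=0) / nk
        varis = (resp * (x_all[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
        counts = resp[:n_lab].T @ onehot + 1e-6  # (K, C), smoothed
        beta = counts / counts.sum(axis=1, keepdims=True)

    return pis, means, varis, beta


def predict_proba(x, pis, means, varis, beta):
    """Class posterior: sum over components of P(k | x) * beta[k, c]."""
    x = np.asarray(x, dtype=float)
    joint = pis * gauss_pdf(x[:, None], means, varis)
    post = joint / joint.sum(axis=1, keepdims=True)
    return post @ beta
```

In this sketch every point owned by component `k` inherits the same class proportions `beta[k]`, which is the label extrapolation the abstract describes and also the limitation it targets: the class posterior cannot vary within a component's region except through mixing with other components.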
Similar resources
Recognizing Named Entities in Tweets
The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN based classifier conducts pre-labeling to collect globa...
Semisupervised Classifier Evaluation and Recalibration
How many labeled examples are needed to estimate a classifier’s performance on a new dataset? We study the case where data is plentiful, but labels are expensive. We show that by making a few reasonable assumptions on the structure of the data, it is possible to estimate performance curves, with confidence bounds, using a small number of ground truth labels. Our approach, which we call Semisupe...
Generative and Discriminative Learning with Unknown Labeling Bias
We apply robust Bayesian decision theory to improve both generative and discriminative learners under bias in class proportions in labeled training data, when the true class proportions are unknown. For the generative case, we derive an entropy-based weighting that maximizes expected log likelihood under the worst-case true class proportions. For the discriminative case, we derive a multinomial ...
SEVEN: Deep Semi-supervised Verification Networks
Verification determines whether two samples belong to the same class or not, and has important applications such as face and fingerprint verification, where thousands or millions of categories are present but each category has scarce labeled examples, presenting two major challenges for existing deep learning models. We propose a deep semisupervised model named SEmi-supervised VErification Netw...
Inverting VAEs for Improved Generative Accuracy
Recent advances in semi-supervised learning with deep generative models have shown promise in generalizing from small labeled datasets (xl,yl) to large unlabeled ones (xu). When the codomain (y) has known structure, a large unfeatured dataset (yu) is potentially available. We develop a parameter-efficient, deep semisupervised generative model for the purpose of exploiting this untapped data sou...
Journal title:
- Neural Computation
Volume 24, Issue
Pages -
Publication date: 2012